Global Highly Parallel Fast Fourier Transform Processor

ABSTRACT

A fast Fourier transform processor and associated process wherein an input sequence of samples is broadcast to each of a plurality of parallel processing elements where sets of accumulated sums of products of these samples with appropriate trigonometric function values are maintained. These sets of accumulated sums are then individually Fourier transformed in parallel to form the Fourier coefficients corresponding to the original input sequence.

United States Patent 3,662,161
Bergland et al.
May 9, 1972

GLOBAL HIGHLY PARALLEL FAST FOURIER TRANSFORM PROCESSOR

Inventors: Glenn D. Bergland, Morris Township, Morris County; Donald E. Wilson, Brookside, both of N.J.

Assignee: Bell Telephone Laboratories, Incorporated, Murray Hill, Berkeley Heights, N.J.

Filed: 1969

Appl. No.: 873,587

U.S. Cl.: ... 444/1

Int. Cl.: G06f 7/38, G06f /34

Field of Search: 235/156; 324/77 6; 340/155

References Cited

UNITED STATES PATENTS

3,517,173, 6/1970, Gilmartin, Jr. et al., 235/156
3,544,775, 12/1970, Bergland et al., 235/151.31

OTHER PUBLICATIONS

Bergland, "FFT Hardware Implementations - An Overview," Trans. on Audio & Electroacoustics, June, pp. 104-108.

Bergland-Wilson, "A FFT Algorithm for a Global, Highly Parallel Processor," IEEE Trans. on Audio & Electroacoustics, Vol. AU-17, No. 2, June 1969, pp. 125-127.

Pease, "Organization of Large Scale Fourier Processors," Journal of the Association for Computing Machinery, Vol. 16, No. 3, July 1969, pp. 474-482.

Primary Examiner: Malcolm A. Morrison
Assistant Examiner: David H. Malzahn
Attorney: R. J. Guenther and William L. Keefauver

17 Claims, 13 Drawing Figures

[FIG. 1: block diagram with input source, processing elements, ensemble control unit, and a processing apparatus comprising program memory, processing unit, and data memory.]

[FIG. 2: processing element with memory, arithmetic unit, and control.]

[FIG. 12: plot; axis label: number of parallel computing elements.]

[FIG. 13: flow chart. Clear registers; set the k counters to 0; set each multiplier; read input sample; perform multiplications; add product to sum in respective k-th registers; perform twiddle multiplication; reorder; perform the Fourier transforms.]

This invention relates to methods and apparatus for signal processing. More particularly, this invention relates to methods and apparatus for the frequency analysis of data signals. Still more particularly, this invention relates to methods and apparatus for performing fast Fourier transforms using parallel processing techniques.

BACKGROUND OF THE INVENTION

Machine methods for frequency analysis and synthesis of signals using Fourier series and integral techniques have long been important areas of scientific and engineering investigation. In recent years there have been developed improved means and methods for performing such analyses and syntheses. Among these improved techniques are included those known collectively as fast Fourier transform (FFT) techniques. These FFT techniques originated in recent history with a paper titled "An Algorithm for the Machine Calculation of Complex Fourier Series," by J. W. Cooley and J. W. Tukey, Mathematics of Computation, Vol. 19, April 1965, pp. 297-301. The computational advantages demonstrated by this paper have spurred research in areas previously felt to be beyond economic feasibility. These advantages, often involving computational savings of time and machine complexity of an order of magnitude or more compared with classical techniques, flow largely from judicious groupings and reorganizations of summation techniques known in the prior art.

Numerous improvements and variations of the original FFT techniques have been developed since the publication of the original Cooley-Tukey paper, several of which were summarized in the June, 1967 and June, 1969 issues of the IEEE Transactions on Audio and Electroacoustics. Other important developments have been disclosed, for example, in U.S. patent applications by M. J. Gilmartin et al., Ser. No. 605,768, filed Dec. 29, 1966, now U.S. Pat. No. 3,517,173 issued June 23, 1970, and G. D. Bergland et al., Ser. No. 605,791, filed Dec. 29, 1966, now U.S. Pat. No. 3,544,775 issued Dec. 1, 1970. Another reference that should prove helpful in understanding the present invention in light of the prior art is one by R. Klahn and R. R. Shively, "FFT Shortcut to Fourier Analysis," Electronics, Vol. 41, No. 8, Apr. 15, 1968, pp. 124-129.

Many data processing applications require identical operations on multiple sets of data. Most present digital computers accomplish this through the use of a single processor that operates on one data set at a time. When the number of data sets is large, however, even the fastest of these computers is too slow to perform its task in a reasonable amount of time. The need for more efficient "bulk" processing has contributed to recent interest in multiprocessing with highly parallel processors. See, for example, S. H. Unger, "A Computer Oriented Towards Spatial Problems," Proc. IRE, Vol. 46, Oct. 1958, pp. 1744-1750; J. H. Holland, "A Universal Computer Capable of Executing an Arbitrary Number of Subprograms Simultaneously," Proc. Eastern Joint Comp. Conf., Boston, Mass., Dec. 1-3, 1959, p. 108; J. Gregory and R. McReynolds, "The SOLOMON Computer," IEEE Trans. on Electronic Computers, Vol. 12, Dec. 1963, pp. 774-781; W. T. Comfort, "Highly Parallel Machines," Proc. 1962 Workshop on Computer Organization, Spartan Books, Washington, D.C., 1963, p. 126.

Such machines obtain higher processing rates through the use of a large number of identical or similar processing units that operate simultaneously or in an overlapping mode, each on its own part of the overall task.

Other particularly useful parallel processing machines have been described in B. A. Crane et al., "Bulk Processing in Distributed Logic Memory," IEEE Trans. on Electronic Computers, Vol. 14, April 1965, pp. 186-196; J. A. Githens, "Highly-Parallel Calculating Arrays in Radar Data Processing," presented at the Workshop on the Development of New Computer Organizations, La Jolla, Calif., June 29, 1967; J. H. Huttenhoff and R. R. Shively, "Arithmetic Unit of Computing Element in a Global, Highly-Parallel Computer," IEEE Trans. on Electronic Computers, Vol. C-18, No. 8, August, 1969; and in B. A. Crane et al., U.S. Pat. Nos. 3,376,555 and 3,391,390 issued Apr. 2, 1968 and July 2, 1968, respectively.

BRIEF DESCRIPTION

The present invention recognizes and applies the principles of Fourier transform technology as implemented using selected aspects of the parallel data processing arts, with suitable extensions and modifications. In particular, means are provided in the present invention for decimating a sequence of input samples into a number of subsequences. Each of these subsequences is then processed in an independent computational element, thereby generating the Fourier transform for each subset taken alone. The parallel processing elements may take any one of several well-recognized forms, and each is under the control of a global control unit. Little or no communication or exchange of information is required between the several processing elements.

The results of the Fourier transformation on the subsequences are then integrated in a subsequent processing step to generate the desired transform corresponding to the initial input sequence.

By suitably designing the arrangement of each processing element and by controlling the flow of data to and from these elements, it is possible to reduce the number of fundamental arithmetic operations, thereby increasing the speed of Fourier transform data processing. Additionally, the algorithm associated with one aspect of the present invention permits simplification of the format of the generated results, thereby simplifying interaction with the system user and other portions of a comprehensive data processing facility. In particular, by suitable intermediate processing, it is possible to sequentially read the results from each successive element of a parallel processing array while retaining the proper order of Fourier coefficients.

BRIEF DESCRIPTION OF THE DRAWINGS

The several aspects of the present invention will be more fully described below in connection with the several figures wherein:

FIG. 1 shows a block diagram of a parallel data processing system suitable for performing fast Fourier transform analysis in accordance with the present invention;

FIG. 2 shows a computational element for use in the parallel processing array of FIG. 1;

FIGS. 3 through 9 show the flow of data and the storage patterns occurring during an FFT computation using the system of FIG. 1 for the special case where the number of processing elements is two;

FIG. 10 shows the formation of certain results for the general case where the number of computational elements is arbitrary;

FIG. 11 shows the number of complex multiplications required for various numbers of computational elements;

FIG. 12 shows the number of complex multiplications required for the special cases where the number of computational elements is 1 and 2; and

FIG. 13 is a flow chart corresponding to one embodiment of the present invention.

NOTATION AND NOMENCLATURE

The detailed descriptions to follow are presented largely in terms of algorithms. These algorithmic descriptions are the means used by those skilled in the data processing art to most effectively convey the substance and meaning of their work to others skilled in the data processing arts.

An algorithm is here, and generally, conceived to be a sequence of steps leading to a desired result. These steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It proves convenient at times, principally for reasons of common usage, to refer to these signals as samples, values, elements, terms, real-valued quantities, complex-valued quantities, numbers, or the like. It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities.

Further, the manipulations performed are often referred to in terms, such as adding, which are commonly associated with mental operations performed by a human being. This is not the sense in which terms such as adding are used here. No human operator is necessary (or desirable in most cases) in any of the operations described herein; the operations are machine operations.

Useful machines for performing part or all of the operations of the present invention include general purpose computers of the IBM 7090/94 class, various of the IBM System 360 class, the GE-600 class, or other similar machines. In all cases there should be borne in mind the distinction between the method operations in operating a computer and the (mathematical) method of computation itself. The present invention relates to method steps for operating a computer in processing electrical or other (e.g., mechanical, chemical) physical signals to generate other desired physical signals. The present invention also relates to apparatus for performing these operations.

The algorithms presented herein are not inherently related to any particular computer or other apparatus. In particular, various general purpose machines, including those mentioned above, may be used, or it may prove more convenient to construct more specialized apparatus to perform the required method steps. The required structure for various of these machines will appear from the description given below in light of the state of presently existing knowledge in the field of parallel-processing computers.

DETAILED DESCRIPTION

Overview

FIG. 1 is a generalized block diagram of a parallel data processing system for use in computing Fourier coefficients. Shown there is a plurality of parallel processing elements 101-1 through 101-q, where q is an arbitrary positive integer. Also shown is an input source 110 for supplying sequences of N digital signals (derived by well-known sampling techniques from continuous signals, where appropriate) for which it is desired that Fourier coefficients be calculated. These input digital signals, which are typically representative of radar returns, speech or similar physical phenomena, are delivered to the processing elements 101-1 through 101-q (in a manner to be described below) by way of multipliers 120-1 through 120-q, respectively.

Also shown in FIG. 1 is an ensemble control unit 130 which exercises global control over the several processing elements 101-1 to 101-q. Control unit 130 is arranged to alter the sequence of operations in each of the processing elements by means of selecting or altering a stored or wired program in each of the processing elements or by any of several well-known techniques. Also shown in FIG. 1 is a conventional processing apparatus 140 which may, for example, be a general purpose data processing system having the usual arithmetic and logical capabilities represented by processor 145 and having provision for storing program data signals and other nonprogram data signals. The apparatus for storing these latter two classes of signals are designated by the identification numerals 150 and 160, respectively. This separation of the various aspects of the processing apparatus 140 is for convenience of discussion only. In actual implementations data processing apparatus 140 may, for example, take the form of any of several of the IBM System 360/ class machines or other similar machines including G.E. 600-series machines.

Processing Elements

FIG. 2 shows a parallel processing element according to one embodiment of the present invention. Shown there is an arithmetic unit 210, a memory 220 and a control unit 230. The actual distribution of the circuitry to perform the required arithmetic operations will take any of several well-recognized configurations depending, for example, on such things as the required size of memory 220, the required speed of operation, and the number of parallel processing elements used. In particular, it is possible and may be desirable to integrate memory with the arithmetic and control aspects of a computational element as described, for example, in B. A. Crane et al., "Bulk Processing in Distributed Logic Memory," IEEE Transactions on Electronic Computers, Vol. 14, April 1965, pp. 186-196; and in the Githens reference, supra. The processing element may also conveniently take the form described in the Huttenhoff and Shively reference, supra. This latter reference is also illustrative of a suitable parallel processing environment for the present invention, and is especially helpful with regard to describing the control of the several processing units.

The arithmetic unit 210 may of course assume a more specialized form. In particular, it may be convenient to use the modularized equipment arrangements described in pending U.S. patent applications by G. D. Bergland et al., Ser. No. 605,791 and M. J. Gilmartin, Jr. et al., Ser. No. 605,768, both filed Dec. 29, 1966 and assigned to the assignee of the present application. Also, in appropriate cases the memory and control units 220 and 230 shown in FIG. 2 may be integrated with the arithmetic unit in the arrangement of Bergland et al., or in accordance with computational elements of the type described in the Crane et al. journal reference, supra. In any event, a further disclosure of the details of computational elements 101-i, i = 1, 2, ..., q, is not essential to a full and complete understanding of the present invention.

FFT Fundamentals

The arithmetic units of processing elements 101-i, i = 1, 2, ..., q, may be arranged to perform any of the now well-known alternate versions of the fast Fourier transform. A useful description of the underlying principles of the FFT may be found in the Bergland et al. reference, supra. Only a brief description of the well-known aspects of the FFT will be given here.

By way of review, it is well to consider that the calculation of Fourier coefficients corresponding to a sequence of N input signals spaced over an interval of T seconds may be represented by

$$X(j)=\sum_{k=0}^{N-1}A(k)\,W^{jk},\qquad j=0,1,\ldots,N-1 \qquad (1)$$

where $W=e^{2\pi i/N}$ and $A(k)$ represents the input sample sequence $A(0), A(1), \ldots, A(N-1)$.
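As a point of reference, the sum of products over the input sequence can be evaluated directly. The following is only a minimal sketch, assuming the positive-exponent convention W = exp(2*pi*i/N); the function name `dft` is hypothetical and does not come from the patent.

```python
import cmath

def dft(samples):
    """Directly evaluate X(j) = sum over k of A(k) * W**(j*k), with W = exp(2*pi*i/N).

    Assumption: positive-exponent convention for W, as in the Cooley-Tukey paper.
    """
    n = len(samples)
    return [
        sum(a * cmath.exp(2j * cmath.pi * ((j * k) % n) / n)
            for k, a in enumerate(samples))
        for j in range(n)
    ]

# A single unit sample at k = 0 gives every coefficient the value 1.
coefficients = dft([1, 0, 0, 0])
```

This direct form needs on the order of N*N multiplications, which is the cost the FFT techniques discussed next are meant to reduce.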

The Cooley-Tukey algorithm reduces equation (1) to a recursive relation, the exact form of which depends on the properties of N. Starting with an initial list of numbers, say $b_0$ (the input sequence), subsequent lists $b_p$ are calculated, the final one of which comprises the required Fourier coefficients. When N is taken equal to $2^m$, m lists will be calculated in accordance with one version of the algorithm. Each of the N entries in a list is identified by an m-digit binary integer. For example, one entry takes the form $b_p(j_0, j_1, \ldots, j_{p-1}, k_{m-p-1}, \ldots, k_0)$ where each of the j's and k's are binary integers.

For the special case mentioned, $N = 2^m$, the recursive relation takes the form

$$b_p(j_0,\ldots,j_{p-1},k_{m-p-1},\ldots,k_0)=\sum_{k_{m-p}=0}^{1}b_{p-1}(j_0,\ldots,j_{p-2},k_{m-p},\ldots,k_0)\,W^{(j_{p-1}2^{p-1}+j_{p-2}2^{p-2}+\cdots+j_0)\,k_{m-p}2^{m-p}} \qquad (2)$$

The binary number specified by the argument $j_0, \ldots, k_0$ can be written simply as $k$, i.e., $b_p(j_0,\ldots,j_{p-1},k_{m-p-1},\ldots,k_0)=b_p(k)$.

As may be seen from equation (2), a highly regular pattern is followed in calculating the terms in the current list. For purposes of constructing the list $b_1$, the list $b_0$ is considered to be partitioned into two equal-length sublists or parts. Each term $b_1(k)$ in the list $b_1$ is derived from two terms from the list $b_0$. One of these terms from $b_0$ is multiplied by the complex phasor $W^{(j_{p-1}2^{p-1}+j_{p-2}2^{p-2}+\cdots+j_0)\,2^{m-p}}$ where, as before, $W=e^{2\pi i/N}$. The first term of $b_1$ consists of the first term in $b_0$ plus the first term in the second half of $b_0$, multiplied by $e^{0}=1+j0$. Likewise the second term of $b_1$ is made up of the second term in $b_0$ plus the second term in the second half of $b_0$ multiplied by the same complex value. This process is continued for all terms in the first half of $b_1$. To obtain the terms in the second half of $b_1$, a similar scheme is followed except that the terms in the second half are multiplied by $e^{\pi i}=-1+j0$.

The Cooley-Tukey algorithm proceeds by transforming the list $b_1$ into a third list $b_2$, and so on. The same type of operations apply in each case, except that after each iteration the list is effectively partitioned into twice as many parts as before. Terms from selected parts of the then-current list are used to form the required terms for the list then being constructed.
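The list-by-list construction can be illustrated in software. The sketch below is one reading of the recursion under stated assumptions: N = 2**m, W = exp(2*pi*i/N), and list indices whose binary digits are j_0 ... j_{p-1} (most significant) followed by k_{m-p-1} ... k_0. The names `bit_reverse` and `cooley_tukey_lists` are hypothetical.

```python
import cmath

def bit_reverse(value, width):
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def cooley_tukey_lists(samples):
    """Build lists b_1 ... b_m from b_0 = samples (N must be a power of two).

    Returns b_m, which holds X(j) at index bit_reverse(j, m).
    """
    n = len(samples)
    m = n.bit_length() - 1
    assert n == 1 << m, "N must equal 2**m"
    b = list(samples)
    for p in range(1, m + 1):
        shift = m - p                          # number of k digits left in the index
        nxt = [0j] * n
        for idx in range(n):
            low = idx & ((1 << shift) - 1)     # digits k_{m-p-1} ... k_0
            top = idx >> shift                 # digits j_0 ... j_{p-1}, j_0 most significant
            j_value = bit_reverse(top, p)      # j_{p-1}*2**(p-1) + ... + j_0
            total = 0j
            for k_digit in (0, 1):             # the summed digit k_{m-p}
                src = ((top >> 1) << (shift + 1)) | (k_digit << shift) | low
                exponent = ((j_value * k_digit) << shift) % n
                total += b[src] * cmath.exp(2j * cmath.pi * exponent / n)
            nxt[idx] = total
        b = nxt
    return b
```

In this sketch each new list is formed from two terms of the previous list, and the final list comes out in bit-reversed order, which is the reordering issue the later sections address.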

The computational operations indicated in the various equations recited herein are each fundamental arithmetic operations which are performed using well-known apparatus. For example, adders are used to add, multipliers to multiply, and so forth. The generation of the complex exponential factors is accomplished using appropriate combinations of sine and cosine function values which may be stored for reference or which may be generated as they are required by well-known function generators.

Particularly simple and efficient generation of the required exponential factors may be effected using the techniques described in R. C. Singleton, "On Computing the Fast Fourier Transform," Comm. ACM, Vol. 10, pp. 647-654, October 1967.

Special Case of Two Processing Elements

A simplified version of the present invention may be understood from a consideration of FIG. 3 and the discussion of it given below. This view shows a portion of a parallel processor in accordance with FIG. 1 generally, but being restricted to the case where q = 2. Thus, in FIG. 3 there are but two parallel processing elements of the form shown generally in FIG. 2 and described above. These are designated 301-1 and 301-2, respectively.

The exemplary case to be considered is that where the input sequence A(k) contains but 16 input samples, i.e., N = 16. These samples are shown at the upper left portion of FIG. 3 as A(0), A(1), ..., A(15), reading from right to left. Multipliers 320-1 and 320-2 are arranged for this simple example to multiply the input sequences applied to them by plus 1 or minus 1 (in a manner to be described below) before passing them on to their respective processing elements 301-1 and 301-2.

The processing elements 301-1 and 301-2 are required to have only eight data storage locations designated 0 through 7. These are typically located in that portion of the computational element shown as memory 220 in FIG. 2. There may, of course, be other information storing portions of the computational elements corresponding to the arithmetic and control portions of a computational element, e.g., those portions which are designated by the identification numerals 210 and 230 in FIG. 2. The computational elements shown in FIG. 3 are each arranged to calculate an eight-point Fourier transform.

In FIG. 3 (as in FIG. 1) the input sequence is shown as being broadcast to both (all) of the computational elements 301-1 and 301-2. In accordance with the algorithm of the present invention for the case N = 16, the first eight data elements in the sequence A(k), that is, A(0), A(1), ..., A(7), are weighted or multiplied by +1 and are stored in both of the computational elements as shown in FIG. 4. The remaining eight data signals A(8), A(9), ..., A(15) are then weighted by +1 as they are applied to element 301-1 and by -1 as they enter element 301-2. These numbers are successively algebraically added to the previous contents of memory locations to form the sums shown in FIG. 5.

It will be recognized that the contents stored in the corresponding locations of the computational elements 301-1 and 301-2 represent the (two-point) Fourier transform coefficients corresponding to the pairs of input data values A(0) and A(8), A(1) and A(9), ..., A(7) and A(15).
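The broadcast-and-accumulate phase for two elements can be sketched as follows. This is an illustrative sketch only; the function name `broadcast_accumulate_two` and the integer stand-in samples are assumptions, and the lists model the eight storage locations of each element.

```python
def broadcast_accumulate_two(samples):
    """Broadcast N samples to two elements, weighting the second half by +1 and -1."""
    half = len(samples) // 2
    element_1 = [0] * half   # storage locations of element 301-1, initially cleared
    element_2 = [0] * half   # storage locations of element 301-2, initially cleared
    for k, sample in enumerate(samples):
        location = k % half
        # Multiplier 320-1 always applies +1; multiplier 320-2 applies +1, then -1.
        weight_2 = 1 if k < half else -1
        element_1[location] += sample
        element_2[location] += weight_2 * sample
    return element_1, element_2

samples = list(range(16))   # stand-in values for A(0) ... A(15)
sums, differences = broadcast_accumulate_two(samples)
# sums[k] is A(k) + A(k + 8); differences[k] is A(k) - A(k + 8)
```

Each pair (sums[k], differences[k]) is the two-point transform of A(k) and A(k + 8), and no multiplication other than a sign change is involved.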

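Before proceeding to the completion of the desired 16-point transform, it is well to recall that the equations for one version of the FFT with $N = r_1 r_2$ are given by

$$A_1(j_0,k_0)=\sum_{k_1=0}^{r_1-1}A(k_1,k_0)\,W^{j_0k_1r_2},\qquad X(j_1,j_0)=\sum_{k_0=0}^{r_2-1}A_1(j_0,k_0)\,W^{j_0k_0}\,W^{j_1k_0r_1} \qquad (3)$$

where $j_0=0,1,\ldots,r_1-1$; $j_1=0,1,\ldots,r_2-1$; $k_0=0,1,\ldots,r_2-1$; $k_1=0,1,\ldots,r_1-1$; and $W=e^{2\pi i/N}$. In the present case $r_1 = 2$ and $r_2 = 8$. Thus we have

$$X(j_1,j_0)=\sum_{k_0=0}^{7}\left[\sum_{k_1=0}^{1}A(k_1,k_0)\,\left(e^{2\pi i/2}\right)^{j_0k_1}\right]\left(e^{2\pi i/16}\right)^{j_0k_0}\left(e^{2\pi i/8}\right)^{j_1k_0} \qquad (4)$$

The expression in square brackets (for $k_0 = 0, 1, \ldots, 7$) is actually a sequence of eight two-point transforms and is precisely the result calculated by the method described above and stored as shown in FIG. 5. These terms are those designated $A_1(j_0,k_0)$ in connection with equation (3) above. The contents of memory shown in FIG. 5 may therefore be represented in the manner shown in FIG. 6. Equation (3) therefore reduces to

$$X(j_1,j_0)=\sum_{k_0=0}^{7}\left[A_1(j_0,k_0)\,\left(e^{2\pi i/16}\right)^{j_0k_0}\right]\left(e^{2\pi i/8}\right)^{j_1k_0} \qquad (5)$$

It will be observed that the multiplication of the $A_1$ terms by the exponential factor $\left(e^{2\pi i/16}\right)^{j_0k_0}$ to form the bracketed quantity in equation (5) is merely a rereferencing of $A_1$ terms in the familiar manner associated with the fast Fourier transform.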

This rereferencing is accomplished by well-known internal arithmetic operations by arithmetic unit 210 in FIG. 2 and the results are as shown in FIG. 7. It proves convenient to relabel the rereferenced $A_1$ terms as $\hat{A}_1$ so that equation (3) becomes, for the present case

$$X(j_1,j_0)=\sum_{k_0=0}^{7}\hat{A}_1(j_0,k_0)\,\left(e^{2\pi i/8}\right)^{j_1k_0} \qquad (6)$$

For convenience the $\hat{A}_1$ terms are referred to as shown in FIG. 8. It will be observed that the operations indicated in equation (6) are precisely those of a conventional Fourier transform taken over the two eight-point sets of $\hat{A}_1$ terms stored in the computational elements as shown in FIG. 8. These two eight-point Fourier transforms are performed identically in the two computational elements, with only the data sequences operated on being different. This transformation may profitably be performed using well-known FFT techniques. FIG. 9 shows the storage pattern of the results of the transformation of the contents stored in the memory portion of each of the computational elements shown in FIG. 8. The transformed quantities corresponding to the $\hat{A}_1$ terms are referred to as $A_2(j_0, j_1)$. It should be noted that $X(j_1, j_0) = A_2(j_0, j_1)$.

It should be noted that if in-place reordering of the $\hat{A}_1$ terms in each processing element is accomplished according to well-known techniques prior to transformation, the X terms may be read out from the entire system shown in FIG. 1 (with q = 2) by alternately reading first from computational element 101-1 and then from computational element 101-2 while proceeding in sequence through the respective memories. Thus, the sometimes confusing and difficult-to-implement reordering of the Fourier coefficients is readily simplified by the present organization. If the reordering of $\hat{A}_1$ terms is not accomplished prior to transformation, the X terms will require the usual reordering.
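The complete two-element computation, including the alternating readout, can be sketched in a few lines. The sketch assumes W = exp(2*pi*i/N) and, for brevity, performs the two half-length transforms with a direct sum rather than an FFT, so each element's results are already in natural order. The names `dft` and `two_element_transform` are hypothetical.

```python
import cmath

def dft(values):
    """Direct transform with the positive-exponent convention (assumed)."""
    n = len(values)
    return [sum(v * cmath.exp(2j * cmath.pi * ((j * k) % n) / n)
                for k, v in enumerate(values))
            for j in range(n)]

def two_element_transform(samples):
    """Two-element version: accumulate, rereference, transform, interleave."""
    n = len(samples)
    half = n // 2
    # Accumulated sums and differences held in the two elements.
    elements = [
        [samples[k0] + samples[k0 + half] for k0 in range(half)],   # j0 = 0
        [samples[k0] - samples[k0 + half] for k0 in range(half)],   # j0 = 1
    ]
    # Rereferencing: multiply the term for (j0, k0) by W**(j0*k0); j0 = 0 is unchanged.
    for j0 in range(2):
        elements[j0] = [v * cmath.exp(2j * cmath.pi * (j0 * k0) / n)
                        for k0, v in enumerate(elements[j0])]
    # Independent half-length transforms, one per element.
    transformed = [dft(e) for e in elements]
    # Readout: alternate between the elements while stepping through memory.
    return [transformed[j % 2][j // 2] for j in range(n)]
```

Reading alternately from the first and second element yields X(0), X(1), X(2), ... in ascending order, which can be checked against a direct 16-point transform.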

Another useful feature of the present invention is that relating to the generation of the required exponential (W) signals required in forming, for example, the expressions shown in FIG. 7. Processing element 101-2 is shown to require eight different W terms, i.e., $W^i$, $i = 0, 1, \ldots, 7$. By noting that for arbitrary m and n, $W^{m+n} = W^m W^n$, it is clear that only $W^1$ need be stored explicitly on computational element 101-2. The remaining W terms may then be calculated by simple multiplication as required, e.g., $W^2$ may be calculated by multiplying W by W, and so forth.
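Generating the W terms from a single stored value can be sketched as below. This is a minimal sketch assuming W = exp(2*pi*i/16); the function name `powers_by_multiplication` is hypothetical.

```python
import cmath

def powers_by_multiplication(w, count):
    """Return [w**0, w**1, ..., w**(count-1)] using only repeated multiplication."""
    powers = [1 + 0j]
    for _ in range(count - 1):
        powers.append(powers[-1] * w)   # W**(m+1) = W**m * W
    return powers

N = 16
W = cmath.exp(2j * cmath.pi / N)        # the single explicitly stored value
table = powers_by_multiplication(W, 8)  # W**0 through W**7
```

Each multiplication adds a small roundoff error, so for long tables the accumulated error grows with the exponent.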

Alternately, the techniques described in Singleton, supra, may be used. Still another simplified variation of these exponential function generating techniques is one described by G. Sande at the IEEE Workshop of FFT processing at Arden House, Oct. 1968. There, Sande proposed calculating subsequent values from previous ones according to $(W^{k+1}-W^{k})/W^{k}=W-1$ or $W^{k+1}=W^{k}(W-1)+W^{k}$. It is said that this particular rearrangement improves roundoff errors in computation. It is often helpful, especially for large values of k, to occasionally insert more accurate, separately calculated W values to prevent the accumulation of roundoff errors.

The Algorithm Generally

With the above notation and the description of the special case of a two-processor algorithm as a background, the general algorithm will now be presented. Briefly stated, the algorithm (illustrated in FIG. 10 for $N = r_1 r_2$) comprises the steps of

1. Dividing the input sequence $\{A(k)\} = \{A(k_1,k_0)\} = \{A(k_1 r_2 + k_0)\}$ for $k = 0, 1, \ldots, N-1$, $k_1 = 0, 1, \ldots, (r_1-1)$ and $k_0 = 0, 1, \ldots, (r_2-1)$, into $r_1$ subsequences with $k_1$ fixed within each sequence. That is, let the first subsequence include only the first $r_2$ elements, the second subsequence only the next $r_2$ elements, and so forth.

2. Multiplying the terms of the $k_1$-th subsequence by $W_{r_1}^{j_0k_1}$, $j_0 = 0, 1, \ldots, (r_1-1)$, to form sequences of product terms $\{A(k_1,k_0)\,W_{r_1}^{j_0k_1}\}$, $j_0 = 0, 1, \ldots, (r_1-1)$, for each $k_1 = 0, 1, \ldots, (r_1-1)$. Here, $W_{r_1}=e^{2\pi i/r_1}$. The multipliers 701-1 through 701-$r_1$ are conveniently adjusted for each new $k_1$.

3. Within initially cleared registers in each of $r_1$ processing elements, combining corresponding product terms formed in step (2) for successive $k_1$'s to form

$$A_1(j_0,k_0)=\sum_{k_1=0}^{r_1-1}A(k_1,k_0)\,W_{r_1}^{j_0k_1}$$

Only $N = r_1 r_2$ storage locations are required to form the $A_1$ terms because the previous contents of each location may be updated by the appropriate product term formed in step (2), as $k_1$ varies over its range. That is, the $A_1$ terms may be summed as shown by accumulating partial sums.

4. Multiplying the $A_1(j_0,k_0)$ terms by $W^{j_0k_0}$ for each $j_0$ and $k_0$ to form corresponding rereferenced terms $\hat{A}_1(j_0,k_0)$ and replacing the $A_1$ terms by the corresponding $\hat{A}_1$ terms.

5. Forming the Fourier coefficients of the sets of $\hat{A}_1$ terms formed in step (4) and storing these coefficients in the locations previously occupied by the $\hat{A}_1$ terms. Again, any standard FFT technique may be employed, or a non-FFT technique may be used.

6. Reading the Fourier coefficients corresponding to the original input sequence by reading successively terms stored in each computational element starting with the first. In each case terms read from each computational element are conveniently read from successive storage locations. The first coefficient is read from the first location in the first computational element, the next from the first location in the second computational element, and so forth.

It should be understood that the Fourier transformation performed at step (5) is conveniently, though not necessarily, performed using a fast Fourier technique. This method is particularly attractive in the present arrangement because no communication between computational elements is then required in performing step (5). When an FFT process is employed in step (5), an in-place reordering according to well-known techniques is conveniently effected prior to transformation so that no post-transformation reordering is required prior to readout.

The above procedures insure that the Fourier coefficients read from memory in accordance with step (6) will be in ascending order, i.e., no reordering of the coefficients generated is required.
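The six steps can be sketched end to end for arbitrary factors. This is a hedged illustration rather than the patented apparatus: it assumes W = exp(2*pi*i/N), models each processing element as a Python list, and uses a direct sum for step (5) so that no reordering is needed. The names `root`, `dft`, and `global_parallel_transform` are hypothetical.

```python
import cmath

def root(n, exponent):
    """Return exp(2*pi*i*exponent/n) (positive-exponent convention, assumed)."""
    return cmath.exp(2j * cmath.pi * (exponent % n) / n)

def dft(values):
    """Direct transform used inside each element; an FFT could be substituted."""
    n = len(values)
    return [sum(v * root(n, j * k) for k, v in enumerate(values)) for j in range(n)]

def global_parallel_transform(samples, r1):
    """Transform N = r1*r2 samples using r1 simulated processing elements."""
    n = len(samples)
    assert n % r1 == 0, "N must be divisible by the number of elements"
    r2 = n // r1
    registers = [[0j] * r2 for _ in range(r1)]      # cleared registers, one list per j0
    # Steps 1-3: broadcast each sample, weight it per element, accumulate partial sums.
    for k1 in range(r1):
        for k0 in range(r2):
            sample = samples[k1 * r2 + k0]
            for j0 in range(r1):
                registers[j0][k0] += sample * root(r1, j0 * k1)
    # Step 4: rereference each accumulated term by W**(j0*k0).
    for j0 in range(r1):
        for k0 in range(r2):
            registers[j0][k0] *= root(n, j0 * k0)
    # Step 5: independent r2-point transforms, no communication between elements.
    registers = [dft(element) for element in registers]
    # Step 6: read location j1 of element j0 for j = j1*r1 + j0, in ascending j.
    return [registers[j0][j1] for j1 in range(r2) for j0 in range(r1)]
```

For example, `global_parallel_transform(samples, 3)` on 12 samples should agree with a direct 12-point transform, with the coefficients already in ascending order.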

The fact that no communication is required between the various computational elements in performing step (5) of the algorithm, and the fact that this step may be executed in parallel, greatly increase the number of operations that can be performed in a unit time interval.

The number of global complex multiplications M that are required for the case where $N = r_1 r_2 = 2^p 2^{m-p}$ is given by M 2 2" 2". It is this expression which must be minimized to obtain the minimum number of computations required. It can readily be shown that with the constraint that a $2^{m-p}$-point transform is to be performed in each computational element, in general, the number of computations required decreases with an increasing number of computational elements. As shown in FIG. 11, however, the incremental decrease in the number of computations required decreases sharply as p increases. In particular, when p = 1 and p = 2 the $W_{r_1}$ terms assume only the values $\pm 1$ and $\pm i$. In this special case only additions and conjugations (but no multiplications) are performed on the original input signals. Thus, the number of complex multiplications reduces to q2 2" which is plotted in FIG. 12. It is shown there that p = 2 (four computational elements) is a good choice which actually gives better reducer than F5 for up to 2 input data samples.

System Control

While the computational procedures required by the present invention are detailed above, precise circuit arrangements have not been included. Thus, for example, the particular logic circuitry required to gate the input samples to the appropriate registers has not been shown. Similarly, the details of the multipliers (e.g., 120-1 to 120-q, in FIG. 1) have been omitted. This has not been by oversight, but rather by design.

Particular practitioners may wish to use preexisting facilities to carry out one or more of the fundamental operations involved or to coordinate one or more of these operations. The following discussion, taken together with the statement of the algorithm and other materials above, is sufficient to enable one to practice the instant invention in any of a number of particular configurations. It will be assumed in this discussion that program memory 150 in FIG. 1 is arranged to store a control program corresponding to the flow chart given in FIG. 13. Processing unit 145 is arranged to execute this program with reference to data (including stored trigonometric values, where appropriate) stored in data memory 160 shown in FIG. 1. Ensemble control unit 130 is then responsive to the program execution to actually close the appropriate switches, clear the appropriate registers, and so forth, that may appear in particular logic configurations. In keeping with the statement of the algorithm in the last section above, N will again be assumed to be equal to $r_1 r_2$.

Referring to FIGS. 1 and 13 then, computer 140 causes all of the data storage registers in the processing elements 101-1 to 101-$r_1$ (recall $q = r_1$ here) to be cleared or reset to zero. An initial value of zero is then set for $k_0$ and $k_1$. These values in turn dictate the initial exponential signals to be supplied to (or be selected in) multipliers 120-1 to 120-$r_1$. These and later-required values may be stored in memory 160 or, as indicated parenthetically above, may be stored in a limited memory located at the multipliers. Alternately, the required multiplying factors may be generated as required according to one of the techniques given earlier.

When the required multiplier signals are at hand at the multipliers 120-1 to 120-r1, the input signals are read one at a time in serial form and presented to each of the multipliers. Each such signal is then multiplied by the multiplier signal and the result combined with the previous signals stored in the first register of the respective computational element. Because with k1 = 0 the multiplier (e^{2πi/r1})^{j0 k1} is 1 for each j0, the first input sample will be merely stored in the first register of each processing element. It will be recalled that these registers were initially cleared, so no prior nonzero sum remains to be combined with this input sample.

Next, k0 is incremented and, since r2 is assumed not to be equal to 1, another sample is read, multiplied by 1 and stored in the respective register in each processing element. This procedure is repeated until k0 = r2 - 1, i.e., until the first subsequence of r2 input terms has been read. Then k1 is incremented, the multiplier values adjusted accordingly, and the next input sample value is read (with k0 being reset to 0).

Again the multiplications are performed by each of the multipliers on each input sample. Now, however, the multipliers are not all 1 as in the case of the first subsequence, i.e., when k1 = 0. Thus, the multiplications are not degenerate and the respective products are added to the existing values stored in the first register of the respective processing elements.

This process is then repeated until the second subsequence of r2 input values is read, multiplied, the products added to previous accumulated sums, and the new sums stored in appropriate registers in the processing elements. Then k1 is incremented, the next r2 elements are read and each of the partial sums augmented by the products formed. When k1 is equal to r1 - 1, the sums indicated in FIG. 10 are complete.
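The accumulation phase just described can be sketched as follows. This is a software illustration only, assuming the input index is split as k = k1 r2 + k0 and that element j0 uses the multiplier (e^{2πi/r1})^{j0 k1}; the function name `accumulate` and the list-of-lists register layout are hypothetical, and the innermost loop stands in for what the hardware does in parallel.

```python
import cmath

def accumulate(samples, r1, r2):
    """Broadcast each sample to r1 elements and keep r2 running sums in each.

    Returns regs[j0][k0] = sum over k1 of A(k1*r2 + k0) * e^(2*pi*i*j0*k1/r1).
    """
    assert len(samples) == r1 * r2
    # All registers start cleared, as in the text.
    regs = [[0j] * r2 for _ in range(r1)]
    for k1 in range(r1):                  # one subsequence per value of k1
        # The multipliers change only when k1 changes.
        w = [cmath.exp(2j * cmath.pi * j0 * k1 / r1) for j0 in range(r1)]
        for k0 in range(r2):              # samples arrive serially
            a = samples[k1 * r2 + k0]
            for j0 in range(r1):          # performed in parallel in hardware
                regs[j0][k0] += w[j0] * a
    return regs

# Six samples with r1 = 2 elements and r2 = 3 registers per element.
regs = accumulate([1, 2, 3, 4, 5, 6], r1=2, r2=3)
```

In this small case element 0 ends up holding the sums 5, 7, 9 and element 1 the differences -3, -3, -3, since its multiplier for the second subsequence is -1.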

Each of the accumulated sums in each of the processing element registers is then multiplied by the appropriate twiddle factor. This may be accomplished in a convenient manner by supplying the twiddle factors in the same manner as were the W multipliers, and actually performing the multiplication in the multipliers 120-1 to 120-r1.
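Why a twiddle factor is needed at this point can be seen from the index factorization. The following is a sketch under assumptions: W = e^{2\pi i/N} with N = r_1 r_2, the input index is k = k_1 r_2 + k_0, and the output index is written j = j_1 r_1 + j_0, where the symbols X and j_1 are introduced here only for illustration.

```latex
\begin{aligned}
X(j_1 r_1 + j_0)
  &= \sum_{k_0=0}^{r_2-1} \sum_{k_1=0}^{r_1-1}
     A(k_1 r_2 + k_0)\, W^{(j_1 r_1 + j_0)(k_1 r_2 + k_0)} \\
  &= \sum_{k_0=0}^{r_2-1}
     \left(e^{2\pi i/r_2}\right)^{j_1 k_0}
     \underbrace{W^{j_0 k_0}}_{\text{twiddle factor}}
     \underbrace{\sum_{k_1=0}^{r_1-1} A(k_1 r_2 + k_0)
     \left(e^{2\pi i/r_1}\right)^{j_0 k_1}}_{\text{accumulated sum in register } k_0 \text{ of element } j_0}
\end{aligned}
```

The second line uses W^{j_1 k_1 r_1 r_2} = 1, W^{r_2} = e^{2\pi i/r_1} and W^{r_1} = e^{2\pi i/r_2}. The only cross term that belongs neither to the accumulation nor to the final r_2-point transform is W^{j_0 k_0}, which is the factor applied to each accumulated sum.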

If an FFT is to be used in further processing the twiddled values, it may be convenient to perform a standard (digits reversed) reordering as mentioned earlier. This is shown (as a dotted block) in FIG. 13. The r2-point Fourier transformation of the contents of each of the r1 processing elements is then performed.

The results are then ready to be read in order from the respective processing elements unless an FFT was used without preprocessing reordering. In this case, the reordering of the results in each processing element may be performed prior to or concurrent with readout. In any event, the desired results are stored in the processing elements in a predictable order.
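The whole sequence (accumulate, twiddle, transform within each element, read out) can be checked numerically with a short sketch. The assumptions are as follows: the positive-exponent convention e^{2πi/N} of the claims, an input index k = k1 r2 + k0, and an output index j = j1 r1 + j0 in which j1 is an illustrative symbol. The names `dft` and `global_parallel_dft` are hypothetical, and a direct r2-point transform stands in for the FFT that each processing element would run.

```python
import cmath

def dft(x):
    """Direct transform using the positive-exponent convention."""
    n = len(x)
    return [sum(x[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def global_parallel_dft(samples, r1, r2):
    """Software model of r1 processing elements, each holding r2 registers."""
    n = r1 * r2
    assert len(samples) == n
    out = [0j] * n
    for j0 in range(r1):                       # one processing element per j0
        # Accumulated sums over k1, one per register k0.
        sums = [sum(samples[k1 * r2 + k0]
                    * cmath.exp(2j * cmath.pi * j0 * k1 / r1)
                    for k1 in range(r1))
                for k0 in range(r2)]
        # Twiddle: rereference each sum by (e^(2*pi*i/N))^(j0*k0).
        twiddled = [s * cmath.exp(2j * cmath.pi * j0 * k0 / n)
                    for k0, s in enumerate(sums)]
        # r2-point transform inside the element (plain DFT in place of an FFT).
        for j1, value in enumerate(dft(twiddled)):
            out[j1 * r1 + j0] = value          # element j0 holds j = j1*r1 + j0
    return out

x = [1, 2, 0, -1, 3, 1, 4, -2, 0, 5, 2, 1]
result = global_parallel_dft(x, r1=4, r2=3)
expected = dft(x)
max_err = max(abs(a - b) for a, b in zip(result, expected))
```

In this model element j0 ends up holding the coefficients with indices j0, r1 + j0, 2 r1 + j0, and so on, which is one concrete instance of the predictable order mentioned above.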

While the above description has proceeded primarily in terms of a plurality of specially designed FFT processing elements (including circuitry of the type described in the Bergland et al. reference, supra, for example), it should be understood that these processing elements can conveniently take the form of general purpose data processors programmed to perform the required individual transformations and other indicated operations. Further, it is clear that the multipliers and processing elements described as separate elements may be suitably combined where appropriate.

What is claimed is:

1. Apparatus for generating data signals representing Fourier coefficients corresponding to a sequence of N = r1r2 input data signals, where r1 and r2 are positive integers with r1 > 2, comprising means for generating trigonometric function values,

first means for forming r1 sets of r2 intermediate coefficients, each intermediate coefficient corresponding to the sum of the products of selected ones of said input data signals with selected trigonometric function values, and

second means for simultaneously generating r1 sets of output data signals representative of the Fourier coefficients for each respective set of said intermediate coefficients.

2. Apparatus according to claim 1 wherein said first means comprises means for segmenting said sequence of input data signals into r1 subsequences, each including r2 data signals, and summing means for forming the sums of selected ones of said products.

3. Apparatus according to claim 2 wherein said summing means comprises r1 sets of r2 registers and means for adding a subsequent one of said products to a previously accumulated partial sum of said products, said registers being initially cleared.

4. Apparatus according to claim 3 wherein said first means further comprises means for forming the product of said sums and corresponding trigonometric function values, thereby to form rereferenced sums.

5. Apparatus according to claim 1 wherein said second means comprises r1 fast Fourier transform processors for operating on each of said r1 sets of intermediate coefficients.

6. Apparatus for generating data signals representing Fourier coefficients corresponding to a sequence of N = r1r2 input data signals comprising

A. a source of trigonometric signals,

B. a plurality of multipliers for forming product signals representing the product of selected ones of said input data signals with selected trigonometric signals,

C. a plurality of processing elements, each including

1. means for forming sum-of-product signals representing accumulated sums of selected ones of said product signals,

2. means for forming signals representing the product of each of said sum-of-product signals and a corresponding trigonometric signal to form rereferenced signals,

D. means for forming signals representing the Fourier coefficients corresponding to said rereferenced signals.

7. Apparatus according to claim 6 further including in each of said processing elements means for reordering said rereferenced signals.

8. The machine method for generating signals representing the Fourier coefficients corresponding to a set of N = r1r2 ordered input signals, r1 and r2 being positive integers with r1 > 2, comprising the steps of

A. generating r1 sets of r2 intermediate signals representing the sum of products of said input samples with selected trigonometric function signals, and

B. simultaneously generating a set of signals representing the Fourier coefficients corresponding to each of said sets of intermediate signals.

9. The method of claim 8 further comprising the step of multiplying said intermediate signals by a rereferencing trigonometric function signal prior to performing step (B).

10. The method of claim 9 wherein said step (B) comprises the step of performing a fast Fourier transformation based on said intermediate signals.

11. The method of claim 10 further comprising the step of reordering the signals formed in accordance with the process of claim 9 prior to performing said fast Fourier transformation.

12. In a digital processor, the machine method of generating signals representing Fourier coefficients corresponding to a sequence of N = r1r2 input signals A(k) = A(k1r2 + k0), k1 = 0, 1, ..., r1-1; k0 = 0, 1, ..., r2-1, comprising the steps of

1. generating first product signals by multiplying each A(k) by (e^{2πi/r1})^{j0 k1} for j0 = 0, 1, ..., r1-1,

2. forming sum signals by adding product signals formed in step 1 which correspond to the same value of j0 and k0,

3. forming second product signals by multiplying said sum signals by (e^{2πi/N})^{j0 k0} for corresponding values of j0 and k0, and

4. generating r1 sets of r2 Fourier coefficients, each set corresponding to that set of r2 of said second product signals corresponding to a fixed value of j0.

13. The method of claim 12 wherein step (1) comprises the steps of

A. reading an input signal,

B. reading r1 stored values of (e^{2πi/r1})^{j0 k1} for the input value read at step (A), for j0 = 0, 1, ..., r1-1,

C. generating a signal corresponding to the product of the signal read at step (A) with each of the signals read at step (B),

D. repeating steps (A), (B), and (C) for each input signal.

14. The method of claim 12 wherein step (1) comprises the steps of

A. reading an input signal,

B. generating values of (e^{2πi/r1})^{j0 k1} for the input value read at step (A) for j0 = 0, 1, ..., r1-1,

C. generating a signal corresponding to the product of the signal read at step (A) with each of the signals generated at step (B),

D. repeating steps (A), (B), and (C) for each input signal.

15. The method of claim 12 wherein step (2) comprises the steps of

A. clearing each of N registers R(j0, k0), j0 = 0, 1, ..., r1-1, k0 = 0, 1, ..., r2-1,

B. adding each of said first product signals as it is formed to the contents of the register having corresponding values of j0 and k0.

16. The method of claim 12 wherein each of said sets of r2 Fourier coefficients are generated in parallel at substantially the same time.

17. The method of claim 16 wherein at step (4) each of said sets of r2 coefficients are generated by performing a fast Fourier transform based on those r2 second product signals corresponding to a fixed value of j0.

UNITED STATES PATENT OFFICE
CERTIFICATE OF CORRECTION

Patent No. 3,662,161 Dated May 9, 1972
Inventor(s) Glenn D. Bergland; Donald E. Wilson

It is certified that error appears in the above-identified patent and that said Letters Patent are hereby corrected as shown below:

Column 2, line 1, after "of" insert --a--; line 22, change "fourier" to --Fourier--; and line 67, change "description" to --descriptions--.

Column 4, line 62, change [illegible].

Column 5, line 16, change [illegible]; and line 39, change [illegible].

Column 6, Eq. (A), [illegible].

Column 7, line 66, change [illegible]; line 71, change [illegible]; line 73, change [illegible]; [illegible] change "Within" to --With--; [illegible] line 21, change "FET" to --FFT-- and change "non-FET" to --non-FFT--.

Column 8, line 1, [illegible] insert --the--.

Column 9, line 24, change "r here" to --(recall q = r1 here)--; line 39, [illegible]; and line 69, [illegible].

Column 10, line 29, change "means of" to --means for--; line 33, after "for" delete "s"; and line 76, change "generating simultaneously" to --simultaneously generating--.

Column 11, line 1, change "multiplicity" to --multiplying--; line 18, change [illegible] to --(e^{2πi/r1})^{j0 k1}--; line 22, change [illegible] to --(e^{2πi/N})^{j0 k0}--; and line 31, change [illegible] to --(e^{2πi/r1})^{j0 k1}--.

Column 12, line 4, change "of" to --for--; and line 9, change [illegible] to --(e^{2πi/r1})^{j0 k1}--.

Signed and sealed this 12th day of December 1972.

(SEAL) Attest:

EDWARD M. FLETCHER, JR., Attesting Officer
ROBERT GOTTSCHALK, Commissioner of Patents
